ABSTRACT

The aim of this paper is to explain how the D-iteration can
be used for efficient asynchronous distributed computation.
We present the main ideas of the method and illustrate
them through very simple examples.

Categories and Subject Descriptors 

G.1.0 [Mathematics of Computing]: Numerical Analysis - Parallel
algorithms; G.1.3 [Mathematics of Computing]: Numerical
Analysis - Numerical Linear Algebra

General Terms 

Algorithms, Performance 

Keywords 

Distributed computation, Iteration, Fixed point, Eigenvector.

1. INTRODUCTION 

As an improved or alternative solution to existing iterative
methods (cf. [2], [6], [1]), the D-iteration algorithm has been
proposed in [3] in a general context of linear equations to
solve X (vector of size N) such that:



X = P.X + B.    (1)



where P is a square matrix of size N x N and B a vector of
size N. In particular, it has been shown how this iterative
method can be further applied to solve X such that

Q.X = X and R.X = B

where Q and R are square matrices of size N x N, or to solve

A.X = B

where A is a square matrix of size N x N.

We recall that the D-iteration approach works when the
spectral radius of P is strictly less than 1 and that it basically
consists in computing efficiently the solution X of
equation (1) using the power series X = \sum_{n=0}^{\infty} P^n B.

2. EQUATION ON H_n

The fluid diffusion model is in the general case described
by the matrix P associated with a weighted graph (p_ij is
the weight of the edge from j to i) and the initial condition
F_0 = B.



We recall the definition of the two vectors used in D-iteration:
the fluid vector F_n defined by:

F_n = (Id - J_{i_n} + P J_{i_n}) F_{n-1},    (2)

where:

• Id is the identity matrix;

• I = {i_1, i_2, .., i_n, ..} with i_n ∈ {1, .., N} is a deterministic
or random sequence such that the number of occurrences of each
value k ∈ {1, .., N} in I is infinite;

• J_k is a matrix with all entries equal to zero except for
the k-th diagonal term: (J_k)_{kk} = 1.

And the history vector H_n is defined by (H_0 initialized to a
null vector):

H_n = \sum_{k=1}^{n} J_{i_k} F_{k-1}.    (3)



Then, we have (cf. [4]):

H_n + F_n = F_0 + P H_n.    (4)

It has been shown in [4] that H_n satisfies the equation:

H_n = (Id - J_{i_n}(Id - P)) H_{n-1} + J_{i_n} F_0.    (5)

In fact, the above equation is easily understood by
remarking that Id - J_{i_n}(Id - P) is a matrix built from P by
extracting the i_n-th line of P and completing the other lines
i ≠ i_n with identity line vectors (zero everywhere except the
i-th column, equal to one).

Note that for every entry i ≠ i_n, (H_n)_i = (H_{n-1})_i.
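The recursion (2)-(3) can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation of [3] or [4]: the function name `d_iteration`, the 2 x 2 matrix and the 0-based node indices are assumptions made for the example. It also checks the invariant (4) numerically.

```python
def d_iteration(P, B, sequence):
    """Run the diffusion (2)-(3) along `sequence`; return (F, H)."""
    n = len(B)
    F = list(B)            # F_0 = B
    H = [0.0] * n          # H_0 = 0
    for i in sequence:
        f = F[i]           # fluid currently held by node i
        H[i] += f          # H_n = H_{n-1} + J_i F_{n-1}
        F[i] = 0.0         # (Id - J_i) F_{n-1}
        for j in range(n):
            F[j] += P[j][i] * f   # + P J_i F_{n-1}: column i of P

    return F, H


# Hypothetical 2 x 2 example (spectral radius < 1), cyclic sequence.
P = [[0.0, 0.5], [0.25, 0.0]]
B = [1.0, 1.0]
F, H = d_iteration(P, B, [0, 1] * 40)

# Invariant (4): H_n + F_n = F_0 + P H_n.
residual = max(
    abs(H[i] + F[i] - B[i] - sum(P[i][j] * H[j] for j in range(2)))
    for i in range(2)
)
```

For this matrix the exact solution of (1) is X = (12/7, 10/7), and H approaches it as the remaining fluid F vanishes.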

2.1 Preliminary operations 

2.1.1 Initial condition 

It is easy to see from equation (5) that when we choose
i_1 = 1, i_2 = 2, .., i_N = N, we obtain H_N ~ B. So we can
directly start the iteration with H_0 = B without any cost.

2.1.2 Diagonal link elimination

Now we can optionally apply the diagonal link elimination
based on the method defined in [3]: when p_ii ≠ 0 is to be
suppressed, it implies two modifications:

• modification of the initial fluid: replace B_i by
B_i/(1 - p_ii);

• modification of all link weights pointing to node i (incoming
links to i, namely all j such that p_ij ≠ 0): this
operation can be replaced by keeping locally at node
i the information that all incoming fluid needs to be
multiplied by 1/(1 - p_ii).
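As a hedged illustration of the two modifications above, the following Python sketch rescales line i of P and the entry B_i by 1/(1 - p_ii) and checks that the solution of X = P.X + B is unchanged. The helper names `eliminate_diagonal` and `solve`, and the 2 x 2 matrix, are assumptions chosen for the example.

```python
def eliminate_diagonal(P, B, i):
    """Return (P2, B2) where p_ii is removed and line i is rescaled."""
    scale = 1.0 / (1.0 - P[i][i])
    P2 = [row[:] for row in P]
    B2 = B[:]
    # incoming weights p_ij (j != i) and the initial fluid B_i are rescaled
    P2[i] = [0.0 if j == i else P[i][j] * scale for j in range(len(B))]
    B2[i] = B[i] * scale
    return P2, B2


def solve(P, B, sweeps=200):
    """Cyclic updates H_i = L_i(P).H + B_i, starting from H = B."""
    n = len(B)
    H = B[:]
    for _ in range(sweeps):
        for i in range(n):
            H[i] = sum(P[i][j] * H[j] for j in range(n)) + B[i]
    return H


P = [[0.2, 0.3], [0.4, 0.1]]   # hypothetical matrix with diagonal links
B = [1.0, 1.0]
P2, B2 = eliminate_diagonal(P, B, 0)
X_before = solve(P, B)
X_after = solve(P2, B2)
```

Both systems have the solution X = (2, 2) here; only the diagonal link of node 0 has been removed.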

3. DISTRIBUTIVE COMPUTATION 

In the following we denote by L_i(P) the i-th line vector extracted
from P:

(L_i(P))_j = p_ij.

We start by assuming that there is a partition of {1, .., N} into K
disjoint sets Ω_k, k = 1, .., K, such that ∪_{k=1}^{K} Ω_k = {1, .., N}.

The choice of the partition can be seen as an independent
optimization task that will not be discussed here (intuitively,
Ω_k should be such that most links are between nodes of
the same set).

3.1 Operations in Ω_k

We assume here that all computations of (H_n)_i, i ∈ Ω_k
are handled by one independent process (or server or virtual
machine), that we call PID_k.

PID_k has as input B and H. H is initially set to B.

3.1.1 Local updates 

PID_k updates H by applying the fluid diffusion model
with i_n ∈ Ω_k:

(H)_{i_n} = L_{i_n}(P).H + (B)_{i_n}.    (6)

3.1.2 Updates sharing 

Periodically, PID_k sends to all other PID_i (i ≠ k) the
updated entries (H)_i, i ∈ Ω_k. When a PID_k receives updates of
(H)_i for i ∉ Ω_k, it updates the current H and can apply the local
updates.
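The local updates (6) and the sharing step can be simulated sequentially as follows. This is a minimal sketch under stated assumptions: the PIDs are simulated in one process, the function name `run_v1`, the 4 x 4 matrix and the sharing period (`local_sweeps`) are hypothetical, and node indices are 0-based.

```python
def run_v1(P, B, parts, local_sweeps, rounds):
    """Simulate K PIDs sequentially; parts[k] is the index set of PID_k."""
    n = len(B)
    views = [list(B) for _ in parts]        # each PID holds its own copy of H, set to B
    for _ in range(rounds):
        for k, omega in enumerate(parts):    # local updates, equation (6)
            H = views[k]
            for _ in range(local_sweeps):
                for i in omega:
                    H[i] = sum(P[i][j] * H[j] for j in range(n)) + B[i]
        for k, omega in enumerate(parts):    # updates sharing
            for m in range(len(parts)):
                if m != k:
                    for i in omega:
                        views[m][i] = views[k][i]
    return views


# Hypothetical non-negative matrix with all line sums <= 0.5.
P = [
    [0.0, 0.3, 0.1, 0.0],
    [0.2, 0.0, 0.0, 0.1],
    [0.1, 0.0, 0.0, 0.4],
    [0.0, 0.1, 0.3, 0.0],
]
B = [1.0, 1.0, 1.0, 1.0]
views = run_v1(P, B, [[0, 1], [2, 3]], local_sweeps=2, rounds=60)
H = views[0]
fixed_point_error = max(
    abs(sum(P[i][j] * H[j] for j in range(4)) + B[i] - H[i]) for i in range(4)
)
```

Between two sharing steps each simulated PID only reads its own copy of H, which is what makes the scheme free of synchronization constraints.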

3.2 Evolution of P 

If for some reason the matrix P is updated to a new
matrix P' and if one is interested in the solution of (1) with
P', the new P' is sent to all PID_k that are concerned by
the modification.

Upon reception of this modification, each PID_k performs the
following updates:

• store the last result H for entries i ∈ Ω_k (can be used
as the new initial vector H_0);

• replace B by B' = F + (P' - P)H for entries i ∈ Ω_k.

(F)_i is computed by: L_i(P).H + (B)_i - (H)_i.

Since each PID_k only requires the information (B)_i for
i ∈ Ω_k, we don't need to synchronize for the new B', but
just update B' locally, and then we can re-apply the methods
of Section 3.1 with P'.

The above result is based on Theorem 4 of [3].
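The update rule can be checked on a small case. The sketch below rests on one reading of the rule, stated here as an assumption: the solution for P' is the stored H plus the solution Y of Y = P'.Y + B', with B' = F + (P' - P)H. This follows from substituting X' = H + Y into X' = P'.X' + B. The matrices and the helper `sweep` are hypothetical.

```python
def sweep(P, B, H, sweeps):
    """Cyclic updates H_i = L_i(P).H + B_i, in place."""
    n = len(B)
    for _ in range(sweeps):
        for i in range(n):
            H[i] = sum(P[i][j] * H[j] for j in range(n)) + B[i]
    return H


n = 2
P_old = [[0.0, 0.5], [0.25, 0.0]]
P_new = [[0.0, 0.5], [0.5, 0.0]]            # hypothetical P'
B = [1.0, 1.0]

H = sweep(P_old, B, list(B), 3)             # partial result before the switch
# (F)_i = L_i(P).H + (B)_i - (H)_i
F = [sum(P_old[i][j] * H[j] for j in range(n)) + B[i] - H[i] for i in range(n)]
# B' = F + (P' - P) H
B_new = [
    F[i] + sum((P_new[i][j] - P_old[i][j]) * H[j] for j in range(n))
    for i in range(n)
]

Y = sweep(P_new, B_new, list(B_new), 100)   # restart on P' with B'
X_new = [H[i] + Y[i] for i in range(n)]     # assumed: stored H plus correction
```

For this P' the exact solution of X = P'.X + B is (2, 2), so the work done before the switch is reused rather than discarded.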

3.3 Another version based on two state vectors 
(V2) 

The drawback of the above method is that each PID has to keep
the complete H vector. For a really very large
matrix P this may be an issue. In such a case, we may use
the two fluid diffusion state vectors H_n and F_n (equations
(2) and (3)). Then each PID_k needs to keep only locally
the partial view: (B)_i, (H_n)_i and (F_n)_i only for i ∈ Ω_k.



In such a scheme, the information exchanged between
PIDs is the quantity F_n that needs to be sent/received: each
PID_k exploits the column vectors extracted from P, say
C_i(P) for the i-th column vector (i ∈ Ω_k). When the diffusion
is applied on node i_n ∈ Ω_k with the fluid f = (F_{n-1})_{i_n},
the quantity f x p_{j i_n} needs to be sent to the PID_k' such that
j ∈ Ω_k', so that PID_k' can add this quantity to (F_n')_j.

The fluid transmission (f x p_{j i_n} to all j) does not require
any synchronization. To avoid too much information exchange,
the fluid transmission can be delayed and regrouped
(we can regroup (f_1 + f_2 + .. + f_m) x p_{j i_n} so that this quantity
is not too small; we can regroup on i_n as well if going to the
same destination j): in fact, we don't need to know who sent
the fluid. The only constraint is that the transmitted fluid
is not lost: this means that each PID_k needs to keep locally
the information on the fluid (f_1 + f_2 + .. + f_m) x p_{j i_n} until its
destination PID (PID_k') acknowledges its reception (as in
TCP, for instance).

In this scheme, the convergence is explicitly monitored by
observing the total fluid quantity (locally updated F_n plus
all fluids being transmitted).
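A sequential simulation of the version (V2) may look as follows. It is a sketch under assumptions, not the author's implementation: `run_v2` and `outbox` are hypothetical names, the PIDs are simulated in one process, fluid is regrouped per destination node and delivered once per round, and acknowledgements are not modelled.

```python
def run_v2(P, B, parts, rounds):
    """Each PID only touches the F and H entries of its own set parts[k]."""
    n = len(B)
    owner = {i: k for k, omega in enumerate(parts) for i in omega}
    F = list(B)                          # F_0 = B
    H = [0.0] * n                        # H_0 = 0
    outbox = [dict() for _ in parts]     # outbox[k][j]: regrouped fluid for remote node j
    for _ in range(rounds):
        for k, omega in enumerate(parts):
            for i in omega:              # cyclic local diffusion
                f, F[i] = F[i], 0.0
                H[i] += f
                for j in range(n):
                    if P[j][i] == 0.0:
                        continue
                    if owner[j] == k:    # local destination: add directly
                        F[j] += P[j][i] * f
                    else:                # remote destination: delay and regroup
                        outbox[k][j] = outbox[k].get(j, 0.0) + P[j][i] * f
        for k in range(len(parts)):      # delayed transmission of regrouped fluid
            for j, q in outbox[k].items():
                F[j] += q
            outbox[k].clear()
    # all outboxes have been flushed, so the total fluid is the local F only
    total_fluid = sum(abs(x) for x in F)
    return H, total_fluid


P = [
    [0.0, 0.3, 0.1, 0.0],
    [0.2, 0.0, 0.0, 0.1],
    [0.1, 0.0, 0.0, 0.4],
    [0.0, 0.1, 0.3, 0.0],
]
B = [1.0, 1.0, 1.0, 1.0]
H, total_fluid = run_v2(P, B, [[0, 1], [2, 3]], rounds=80)
fixed_point_error = max(
    abs(sum(P[i][j] * H[j] for j in range(4)) + B[i] - H[i]) for i in range(4)
)
```

The receiver only adds the incoming quantity to its own F entry, so it never needs to know which node sent it.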

4. OPTIMIZATION PROBLEM 

Given the partition sets Ω_k, the question is when to share
the local updates on H. Here is a first possible solution.

4.1 Local remaining fluid 

We can define the local remaining fluid by:

r_k = \sum_{i ∈ Ω_k} |L_i(P).H + (B)_i - (H)_i|.

Assuming a non-negative matrix P and applying ideas of [4],
we could decide to share the results of the local computations
with the other PIDs when

r_k < T_k

where T_k is the local threshold for Ω_k. When this condition
is satisfied, we could then update T_k, for
instance by dividing it by a factor α > 1:

T_k = T_k/α.

In the version (V2), r_k is explicitly given by the L1 norm
of the local fluid (F)_i, i ∈ Ω_k.
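The sharing rule of this section can be written as two small helpers. The names `local_remaining_fluid` and `should_share`, and the default α = 2, are assumptions made for this sketch.

```python
def local_remaining_fluid(P, B, H, omega):
    """r_k: sum over i in omega of |L_i(P).H + (B)_i - (H)_i|."""
    n = len(B)
    return sum(
        abs(sum(P[i][j] * H[j] for j in range(n)) + B[i] - H[i]) for i in omega
    )


def should_share(r_k, T_k, alpha=2.0):
    """Return (share, new_threshold); the threshold is divided by alpha after sharing."""
    if r_k < T_k:
        return True, T_k / alpha
    return False, T_k


P = [[0.0, 0.5], [0.25, 0.0]]     # hypothetical example
B = [1.0, 1.0]
H = list(B)                       # H initially set to B
r_0 = local_remaining_fluid(P, B, H, [0])
r_1 = local_remaining_fluid(P, B, H, [1])
decision_0 = should_share(r_0, 0.3)
decision_1 = should_share(r_1, 0.3)
```

With the threshold 0.3, only the second set (r_1 = 0.25) triggers a transmission, and its threshold drops to 0.15.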

4.2 Diffusion sequence I

Here we need to choose the sequence order i_n ∈ Ω_k for
each k. By default, we can apply a cyclic order. We could
also apply some greedy approach as in [4], [3]. Finding the
optimal sequence or a practical sub-optimal sequence for
each k is an open problem.

4.3 Sharing locally updated results 

The transmission of H to other PIDs is triggered when

• r_k < T_k, or

• an update of H is received from another PID.

In the version (V2), F may be sent only when:

• r_k < T_k.

When the PIDs advance at very different speeds (monitoring
T_k), we can think of splitting the set Ω_k associated
with the slowest PID_k, or possibly regrouping the sets Ω_k
associated with the fastest PIDs, etc.



4.4 Distance to the limit 

The limit is reached when \sum_k r_k = 0. In the case of PageRank
style equations, it has been shown in [3] that (\sum_k r_k)/(1 - d)
defines an exact distance to the limit, or an upper bound in
the presence of dangling nodes.

In the general case, the spectral radius of P plays a role
(but is not necessarily known). For instance, if for all i,
\sum_j |p_ji| < 1, then taking ε = min_i (1 - \sum_j |p_ji|), (\sum_k r_k)/ε
defines an upper bound of the distance to the limit.
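The general bound can be illustrated numerically. In this sketch the function name `distance_bound` and the 2 x 2 matrix are assumptions, and the distance to the limit is measured in the L1 norm.

```python
def distance_bound(P, B, H):
    """(sum_k r_k) / eps with eps = min_i (1 - sum_j |p_ji|)."""
    n = len(B)
    r = sum(
        abs(sum(P[i][j] * H[j] for j in range(n)) + B[i] - H[i]) for i in range(n)
    )
    # sum over j of |p_ji| is the absolute sum of column i of P
    eps = min(1.0 - sum(abs(P[j][i]) for j in range(n)) for i in range(n))
    return r / eps


P = [[0.0, 0.5], [0.25, 0.0]]       # hypothetical example, column sums 0.25 and 0.5
B = [1.0, 1.0]
H = list(B)
bound = distance_bound(P, B, H)

X = [12.0 / 7.0, 10.0 / 7.0]        # exact solution of X = P.X + B for this P
true_distance = sum(abs(X[i] - H[i]) for i in range(2))
```

Here the remaining fluid is 0.75 and ε = 0.5, so the bound is 1.5, while the true L1 distance is 8/7.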

5. EXAMPLES 

5.1 Example with 2 PIDs 

Let's take a simple example to illustrate the above method. 
We set: 



A(1) = \begin{pmatrix} 5 & 3 & 0 & 0 \\ 3 & 7 & 0 & 0 \\ 0 & 0 & 8 & 4 \\ 0 & 0 & 2 & 3 \end{pmatrix}

And we look for X such that A.X = B = (1, 1, 1, 1)^t.

In this case, we defined A(1) so that there is no correlation
between Ω_1 = {1, 2} and Ω_2 = {3, 4}. As expected, the
gain factor is then about 2 (assuming no information transmission
cost) with 2 PIDs, as shown in Figure 1. In Figure 1, we
compared the Jacobi and Gauss-Seidel iterations and the
D-iteration on P obtained from A by dividing each line by
the diagonal term (cf. [3]).

For the D-iteration, we applied the cyclical sequence {1, 2, 3, 4}
(using equation (5) on H_n). For the 2 PIDs case, we applied
jointly the cyclical sequences {1, 2} and {3, 4} exactly twice
before sharing the local computation results.
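The set-up of this example can be reproduced with the sketch below. Two assumptions are made: A(1) is taken to be the block-diagonal matrix with blocks (5 3; 3 7) and (8 4; 2 3), and "dividing each line by the diagonal term" is read as P = Id - D^{-1}A with B rescaled in the same way, where D is the diagonal of A. The helper `normalize` is a hypothetical name.

```python
def normalize(A, B):
    """Return (P, Bn) with P = Id - D^{-1} A and Bn = D^{-1} B (assumed reading)."""
    n = len(B)
    P = [
        [0.0 if i == j else -A[i][j] / A[i][i] for j in range(n)]
        for i in range(n)
    ]
    Bn = [B[i] / A[i][i] for i in range(n)]
    return P, Bn


A1 = [[5, 3, 0, 0], [3, 7, 0, 0], [0, 0, 8, 4], [0, 0, 2, 3]]  # assumed A(1)
B = [1, 1, 1, 1]
P, Bn = normalize(A1, B)

H = list(Bn)
for _ in range(60):                 # cyclical sequence {1, 2, 3, 4}
    for i in range(4):
        # equation (5): only the entry i_n of H is updated
        H[i] = sum(P[i][j] * H[j] for j in range(4)) + Bn[i]

residual = max(
    abs(sum(A1[i][j] * H[j] for j in range(4)) - B[i]) for i in range(4)
)
```

Since the two blocks are decoupled, running the sequences {1, 2} and {3, 4} on two separate PIDs yields exactly the same values of H, which is consistent with the gain factor of about 2.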
Figure 1: Example: 2 PIDs for A(1).



Now, we set:

A(2) = \begin{pmatrix} 5 & 3 & 1 & 1 \\ 3 & 7 & 1 & 0 \\ 1 & 1 & 8 & 4 \\ 1 & 1 & 2 & 3 \end{pmatrix}

In this case, we added values in A(2) so that there is correlation
between Ω_1 and Ω_2. There is still a visible gain
factor, as shown in Figure 2.
Figure 2: Example: 2 PIDs with correlation for A(2).

Finally, we set:



A(3) = \begin{pmatrix} 5 & 3 & 1 & 1 \\ 3 & 7 & 1 & 1 \\ 1 & 1 & 8 & 4 \\ 1 & 1 & 2 & 3 \end{pmatrix}

In this case, we added 1 on (A(3))_{2,4}. Then there is no
longer any significant gain, as shown in Figure 3.
Figure 3: Example: 2 PIDs with correlation for A(3).

5.2 Example of A updates with 2 PIDs 

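We set:

A = \begin{pmatrix} 5 & 3 & 0 & 0 \\ 3 & 7 & 0 & 0 \\ 0 & 0 & 8 & 4 \\ 0 & 0 & 2 & 3 \end{pmatrix}

and

A' = \begin{pmatrix} 5 & 3 & 0 & 0 \\ 3 & 7 & 0 & 1 \\ 0 & 0 & 8 & 4 \\ 0 & 0 & 2 & 3 \end{pmatrix}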

Then P and P' are defined from A and A' as in Section 5.1, by
dividing each line by the diagonal term.

P has been applied up to iteration 5, then we switched to
P' from iteration 6. Figure 4 shows the results:
Figure 4: Example: 2 PIDs with evolution of P to P'.



The above examples are only for easy illustration. The
gain of the distributed approach should be much clearer for
the computation of X for a large matrix P. This will be
addressed in a future paper in the context of the PageRank
equations, on the web graph (on which the gain of such an
approach without distributed computations is shown in [4])
or on a general graph (such as the PageRank extensions
on the paper-author graph for research publications [5]).



6. CONCLUSION 

In this paper, we presented two asynchronous computation
schemes associated with the D-iteration approach. We
believe that their potential is very promising; further
investigation (and implementation) for a really large P, such
as the PageRank matrix associated with the web graph,
will be addressed in a future paper.